Exploring Class Overlap in Classification Challenges: Introducing the R Package

useR! 2024, Salzburg, Austria

Priyanga Dilini Talagala

July 9, 2024

Data Quality Problems

Class Overlapping Problem

  • Class overlap occurs when instances of more than one class share a common region in the data space and are not clearly separable

Class Overlapping Problem

  • Class overlap occurs when instances of more than one class share a common region in the data space and are not clearly separable

  • This overlap can happen due to:

    • Inherent Similarity: Natural similarity between classes

Class Overlapping Problem

  • Class overlap occurs when instances of more than one class share a common region in the data space and are not clearly separable

  • This overlap can happen due to:

    • Inherent Similarity: Natural similarity between classes

    • Noise: Variability or errors in data collection

Class Overlapping Problem

  • Class overlap occurs when instances of more than one class share a common region in the data space and are not clearly separable

  • This overlap can happen due to:

    • Inherent Similarity: Natural similarity between classes

    • Noise: Variability or errors in data collection

    • Feature Representation: Insufficient or inadequate features to separate classes

Class Overlapping Problem

  • Class overlap occurs when instances of more than one class share a common region in the data space and are not clearly separable

  • This overlap can happen due to:

    • Inherent Similarity: Natural similarity between classes

    • Noise: Variability or errors in data collection

    • Feature Representation: Insufficient or inadequate features to separate classes

  • It makes it challenging for classifiers to accurately distinguish between classes

Implications of Class Overlap

  • Classifiers struggle to correctly classify instances due to overlapping regions

Implications of Class Overlap

  • Classifiers struggle to correctly classify instances due to overlapping regions

  • Higher error rates occur in areas where classes overlap, leading to more instances being misclassified

Implications of Class Overlap

  • Classifiers struggle to correctly classify instances due to overlapping regions

  • Higher error rates occur in areas where classes overlap, leading to more instances being misclassified

  • If the problem of class overlap is not addressed, models may become overly complex, leading to overfitting issues where the model performs well on training data but poorly on unseen data